
    A Formal Model For Real-Time Parallel Computation

    The imposition of real-time constraints on a parallel computing environment, specifically high-performance cluster-computing systems, introduces a variety of challenges with respect to the formal verification of the system's timing properties. In this paper, we briefly motivate the need for such a system, and we introduce an automaton-based method for performing such formal verification. We define the concept of a consistent parallel timing system: a hybrid system consisting of a set of timed automata (specifically, timed Büchi automata as well as a timed variant of standard finite automata), intended to model the timing properties of a well-behaved real-time parallel system. Finally, we give a brief case study to demonstrate the concepts in the paper: a parallel matrix multiplication kernel which operates within provable upper time bounds. We give the algorithm used, a corresponding consistent parallel timing system, and empirical results showing that the system operates under the specified timing constraints. Comment: In Proceedings FTSCS 2012, arXiv:1212.657
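
    The abstract does not reproduce the matrix multiplication kernel itself. As a rough illustration only, the sketch below (Python, with an arbitrary problem size and a made-up bound TIME_BOUND_S) times a row-partitioned parallel matrix multiplication and checks the measured runtime against that bound; this mirrors the empirical check in the case study, not the automaton-based verification.

        import time
        import numpy as np
        from concurrent.futures import ThreadPoolExecutor

        # Hypothetical upper time bound in seconds; the paper derives such bounds
        # formally with timed automata, whereas this sketch only checks the runtime.
        TIME_BOUND_S = 2.0
        N, WORKERS = 1024, 4

        A = np.random.rand(N, N)
        B = np.random.rand(N, N)

        def multiply_block(rows):
            # NumPy's matmul releases the GIL, so row blocks of A are
            # multiplied by B in parallel threads.
            return rows @ B

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=WORKERS) as pool:
            blocks = np.array_split(A, WORKERS)   # partition rows across workers
            C = np.vstack(list(pool.map(multiply_block, blocks)))
        elapsed = time.perf_counter() - start
        print(f"elapsed {elapsed:.3f} s, within bound: {elapsed <= TIME_BOUND_S}")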

    Algorithms in comparative genomics

    The field of comparative genomics is abundant with problems of interest to computer scientists. In this thesis, the author presents solutions to three contemporary problems: obtaining better alignments for phylogeny reconstruction, identifying related RNA sequences in genomes, and ranking Single Nucleotide Polymorphisms (SNPs) in genome-wide association studies (GWAS). Sequence alignment is a basic and widely used task in bioinformatics. Its applications include identifying protein structure, RNAs and transcription factor binding sites in genomes, and phylogeny reconstruction. Phylogenetic descriptions depend not only on the employed reconstruction technique, but also on the underlying sequence alignment. The author has studied and established a simple prescription for obtaining a better phylogeny by improving the underlying alignments used in phylogeny reconstruction. This was achieved by improving upon Gotoh's iterative heuristic by iterating with maximum parsimony guide-trees. This approach has shown an improvement in accuracy over standard alignment programs. A novel alignment algorithm named Probalign-RNAgenome that can identify non-coding RNAs in genomic sequences was also developed. Non-coding RNAs play critical roles in the cell, such as gene regulation. It is thought that many such RNAs lie undiscovered in the genome. To date, alignment-based approaches have been shown to be more accurate than thermodynamic methods for identifying such non-coding RNAs. Probalign-RNAgenome employs a probabilistic consistency based approach for aligning a query RNA sequence to its homolog in a genomic sequence. Results show that this approach is more accurate on real data than the widely used BLAST and Smith-Waterman algorithms. The realm of comparative genomics also includes a large number of recently conducted GWAS, which aim to identify regions in the genome that are associated with a given disease. The support vector machine (SVM) provides a discriminative alternative to the widely used chi-square statistic in GWAS. A novel hybrid strategy that combines the chi-square statistic with the SVM was developed and implemented. Its performance was studied on simulated data and the Wellcome Trust Case Control Consortium (WTCCC) studies. Results presented in this thesis show that the hybrid strategy ranks causal SNPs in simulated data significantly higher than the chi-square test and the SVM alone. The results also show that the hybrid strategy ranks previously replicated SNPs and associated regions (where applicable) of type 1 diabetes, rheumatoid arthritis, and Crohn's disease higher than the chi-square test, the SVM, and SVM Recursive Feature Elimination (SVM-RFE).
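
    To make the hybrid strategy concrete, here is a minimal sketch of a chi-square-then-SVM ranking on a synthetic genotype matrix, assuming scikit-learn; the sizes, the cutoff top_k, and the use of LinearSVC weights as the second-stage ranking are illustrative choices, not the thesis implementation.

        import numpy as np
        from sklearn.feature_selection import chi2
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(0)
        n_samples, n_snps, top_k = 200, 1000, 50          # illustrative sizes

        X = rng.integers(0, 3, size=(n_samples, n_snps))  # genotypes coded 0/1/2
        y = rng.integers(0, 2, size=n_samples)            # case/control labels

        # Stage 1: rank SNPs by the chi-square statistic and keep the top_k.
        chi2_stats, _ = chi2(X, y)
        top = np.argsort(chi2_stats)[::-1][:top_k]

        # Stage 2: re-rank the retained SNPs by the magnitude of linear-SVM weights.
        svm = LinearSVC(C=1.0, dual=False).fit(X[:, top], y)
        hybrid_rank = top[np.argsort(np.abs(svm.coef_[0]))[::-1]]
        print("hybrid ranking (SNP indices):", hybrid_rank[:10])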

    eProbalign: generation and manipulation of multiple sequence alignments using partition function posterior probabilities

    Probalign computes maximal expected accuracy multiple sequence alignments from partition function posterior probabilities. To date, Probalign is among the very best scoring methods on the BAliBASE, HOMSTRAD and OXBENCH benchmarks. Here, we introduce eProbalign, an online implementation of the approach. Moreover, the eProbalign web server doubles as an online platform for post-alignment analysis. The heart and soul of the post-alignment functionality is the Probalign Alignment Viewer applet, which provides users a convenient means to manipulate the alignments by posterior probabilities. The viewer can also be used to produce graphical and text versions of the output. The eProbalign web server and the underlying Probalign source code are freely accessible at http://probalign.njit.ed
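
    For readers unfamiliar with maximal expected accuracy alignment, the sketch below shows the pairwise version of the idea, assuming a posterior match-probability matrix P has already been computed (for instance from a partition function, as Probalign does); it is not the eProbalign code and ignores gap handling beyond a zero gap score.

        import numpy as np

        def mea_alignment(P):
            # Pairwise maximal expected accuracy alignment over posterior matrix P.
            # P[i, j] is the assumed-given posterior probability that residue i of
            # sequence x aligns to residue j of sequence y; gaps score zero.
            n, m = P.shape
            M = np.zeros((n + 1, m + 1))
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    M[i, j] = max(M[i - 1, j - 1] + P[i - 1, j - 1],  # match i with j
                                  M[i - 1, j],                        # gap in y
                                  M[i, j - 1])                        # gap in x
            # Traceback to recover the aligned residue pairs.
            pairs, i, j = [], n, m
            while i > 0 and j > 0:
                if M[i, j] == M[i - 1, j - 1] + P[i - 1, j - 1]:
                    pairs.append((i - 1, j - 1))
                    i, j = i - 1, j - 1
                elif M[i, j] == M[i - 1, j]:
                    i -= 1
                else:
                    j -= 1
            return M[n, m], pairs[::-1]

        # Toy posterior matrix; real values would come from the partition function.
        P = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.7]])
        print(mea_alignment(P))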

    Searching for evolutionary distant RNA homologs within genomic sequences using partition function posterior probabilities

    Background: Identification of RNA homologs within genomic stretches is difficult when pairwise sequence identity is low or unalignable flanking residues are present. In both cases structure-sequence or profile/family-sequence alignment programs become difficult to apply because of unreliable RNA structures or family alignments. As such, local sequence-sequence alignment programs are frequently used instead. We have recently demonstrated that maximal expected accuracy alignments using partition function match probabilities (implemented in Probalign) are significantly better than contemporary methods on heterogeneous length protein sequence datasets, thus suggesting an affinity for local alignment.
    Results: We create a pairwise RNA-genome alignment benchmark from RFAM families with average pairwise sequence identity up to 60%. Each dataset contains a query RNA aligned to a target RNA (of the same family) embedded in a genomic sequence at least 5K nucleotides long. To simulate common conditions when the exact ends of an ncRNA are unknown, each query RNA has 5' and 3' genomic flanks of size 50, 100, and 150 nucleotides. We subsequently compare the error of the Probalign program (adjusted for local alignment) to the commonly used local alignment programs HMMER, SSEARCH, and BLAST, and to the popular ClustalW program with zero end-gap penalties. Parameters were optimized for each program on a small subset of the benchmark. Probalign has the overall highest accuracy on the full benchmark. It leads SSEARCH (the next best method) by 10% accuracy on 5 out of 22 families. On datasets restricted to a maximum of 30% sequence identity, Probalign's overall median error is 71.2% vs. 83.4% for SSEARCH (P-value < 0.05). Furthermore, on these datasets Probalign leads SSEARCH by at least 10% on five families; SSEARCH leads Probalign by the same margin on two of the fourteen families. We also demonstrate that the Probalign mean posterior probability, compared to the normalized SSEARCH Z-score, is a better discriminator of alignment quality. All datasets and software are available online.
    Conclusion: We demonstrate, for the first time, that partition function match probabilities used for expected accuracy alignment, as done in Probalign, provide statistically significant improvement over current approaches for identifying distantly related RNA sequences in larger genomic segments.
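
    The mean posterior probability used above as a quality discriminator is straightforward to compute once an alignment and a posterior matrix are in hand; the snippet below is a small sketch of that calculation with a mock matrix, not the benchmark code itself.

        import numpy as np

        def mean_posterior_probability(pairs, P):
            # Mean match posterior over the aligned residue pairs of an alignment;
            # P is an assumed precomputed posterior matrix (e.g., from Probalign)
            # and pairs lists the aligned (query, target) positions.
            if not pairs:
                return 0.0
            return float(np.mean([P[i, j] for i, j in pairs]))

        # Toy example: three aligned residue pairs under a mock posterior matrix.
        P = np.array([[0.8, 0.1, 0.0],
                      [0.2, 0.6, 0.1],
                      [0.0, 0.3, 0.9]])
        print(mean_posterior_probability([(0, 0), (1, 1), (2, 2)], P))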

    Malware Detection Using Frequency Domain-Based Image Visualization and Deep Learning

    We propose a novel method to detect and visualize malware through image classification. The executable binaries are represented as grayscale images obtained from the count of N-grams (N=2) of bytes in the Discrete Cosine Transform (DCT) domain, and a neural network is trained for malware detection. A shallow neural network is trained for classification, and its accuracy is compared with deep-network architectures such as ResNet that are trained using transfer learning. Neither disassembly nor behavioral analysis of malware is required for these methods. Motivated by the visual similarity of these images for different malware families, we compare our deep neural network models with standard image features like GIST descriptors to evaluate the performance. A joint feature measure is proposed to combine different features using error analysis to obtain an accurate ensemble model for improved classification performance. A new dataset called MaleX, which contains around 1 million malware and benign Windows executable samples, was created for large-scale malware detection and classification experiments. Experimental results are quite promising, with 96% binary classification accuracy on MaleX. The proposed model also generalizes well to larger unseen malware samples, and the results compare favorably with state-of-the-art static analysis-based malware detection algorithms.
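
    The abstract leaves the exact image construction open; as one plausible reading, the sketch below builds a 256x256 byte-bigram count matrix from an executable and takes a 2-D DCT of its log-scaled counts to form a grayscale image. The normalization and the log scaling are assumptions, and "sample.exe" is a hypothetical input path.

        import numpy as np
        from scipy.fft import dctn            # 2-D Discrete Cosine Transform

        def binary_to_dct_image(path):
            # Count 2-grams (N = 2) of consecutive bytes into a 256 x 256 matrix,
            # move to the DCT domain, and rescale to an 8-bit grayscale image.
            data = np.fromfile(path, dtype=np.uint8)
            counts = np.zeros((256, 256), dtype=np.float64)
            np.add.at(counts, (data[:-1], data[1:]), 1)
            spectrum = np.abs(dctn(np.log1p(counts), norm="ortho"))
            img = 255 * (spectrum - spectrum.min()) / (spectrum.max() - spectrum.min() + 1e-12)
            return img.astype(np.uint8)

        # img = binary_to_dct_image("sample.exe")   # hypothetical input file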

    MalGrid: Visualization Of Binary Features In Large Malware Corpora

    The number of malware samples is constantly on the rise. Though most new malware are modifications of existing ones, their sheer number is quite overwhelming. In this paper, we present a novel system to visualize and map millions of malware samples to points in a 2-dimensional (2D) spatial grid. This enables visualizing relationships within large malware datasets that can be used to develop triage solutions to screen different malware rapidly and provide situational awareness. Our approach links two visualizations within an interactive display. Our first view is a spatial point-based visualization of similarity among the samples, based on a reduced-dimensional projection of binary feature representations of the malware. Our second, spatial grid-based view provides better insight into the similarities and differences between selected malware samples in terms of the binary-based visual representations they share. We also provide a case study in which the effect of packing on the malware data is correlated with the complexity of the packing algorithm. Comment: Submitted version - MILCOM 2022 IEEE Military Communications Conference. The high-quality images in this paper can be found on GitHub (https://github.com/Mayachitra-Inc/MalGrid).
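
    The abstract does not name the projection method; as a stand-in, the sketch below uses PCA from scikit-learn to project placeholder binary feature vectors to 2D and then snaps each sample to a cell of a G x G spatial grid, which is the general point-to-grid mapping the paper describes.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(1)
        features = rng.random((500, 1024))     # placeholder binary feature vectors

        # Project to 2-D, then snap each sample to a cell of a G x G spatial grid.
        G = 64
        xy = PCA(n_components=2).fit_transform(features)
        xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-12)  # scale to [0, 1]
        cells = np.minimum((xy * G).astype(int), G - 1)            # integer grid coordinates
        print(cells[:5])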

    GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

    Background: Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits.
    Findings: Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run.
    Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
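
    The fragment-parallel scheme described in the Findings can be sketched as follows: split the SNP matrix into non-overlapping fragments and score between-fragment SNP pairs in parallel worker processes. The interaction score used below (correlation of the genotype product with the trait) is only a stand-in, since the abstract does not specify GENIE's test, and within-fragment pairs are omitted for brevity.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor
        from itertools import combinations

        def interaction_scores(args):
            # Score all SNP pairs between two fragments; indices are fragment-local.
            frag_a, frag_b, y = args
            scores = {}
            for i in range(frag_a.shape[1]):
                for j in range(frag_b.shape[1]):
                    prod = frag_a[:, i] * frag_b[:, j]
                    if prod.std() > 0:
                        scores[(i, j)] = abs(np.corrcoef(prod, y)[0, 1])
            return scores

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            X = rng.integers(0, 3, size=(500, 200)).astype(float)  # genotype matrix
            y = rng.integers(0, 2, size=500).astype(float)         # binary trait
            fragments = np.array_split(X, 4, axis=1)               # non-overlapping SNP sets
            jobs = [(a, b, y) for a, b in combinations(fragments, 2)]
            with ProcessPoolExecutor() as pool:
                results = list(pool.map(interaction_scores, jobs))
            print(sum(len(r) for r in results), "SNP pairs scored")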

    Ranking causal variants and associated regions in genome-wide association studies by the support vector machine and random forest

    We study the number of causal variants and associated regions identified by top SNPs in rankings given by the popular 1 df chi-squared statistic, the support vector machine (SVM), and the random forest (RF) on simulated and real data. If we apply the SVM and RF to the top 2r chi-square-ranked SNPs, where r is the number of SNPs with P-values passing the Bonferroni correction, we find that both improve the ranks of causal variants and associated regions and achieve higher power on simulated data. These improvements, however, as well as the stability of the SVM and RF rankings, progressively decrease as the cutoff increases to 5r and 10r. As applications, we compare the ranks of previously replicated SNPs in real data, the ranks of associated regions in type 1 diabetes as provided by the Type 1 Diabetes Consortium, and the disease risk prediction accuracies given by the top-ranked SNPs of the three methods. Software and a web server are available at http://svmsnps.njit.edu
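
    A minimal sketch of the top-2r scheme, assuming scikit-learn: rank SNPs by a chi-square test, let r be the count passing the Bonferroni threshold, and re-rank the top 2r SNPs by random forest feature importances. The synthetic data, the use of sklearn's chi2 in place of the 1 df genotypic test, and all sizes are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import chi2

        rng = np.random.default_rng(0)
        n, m = 300, 2000
        X = rng.integers(0, 3, size=(n, m))          # genotypes coded 0/1/2
        y = rng.integers(0, 2, size=n)               # case/control labels

        stats, pvals = chi2(X, y)                    # single-SNP association scores
        r = max(int((pvals < 0.05 / m).sum()), 1)    # SNPs passing Bonferroni
        top2r = np.argsort(pvals)[: 2 * r]           # chi-square top-2r cutoff

        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(X[:, top2r], y)
        rf_rank = top2r[np.argsort(rf.feature_importances_)[::-1]]
        print("r =", r, "RF-ranked top SNPs:", rf_rank[:10])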